Genre classification using Balanced Winnow in the DEFT 2014 challenge

Author

  • Eva D'hondt
Abstract

In this report we present our work on the first subtask of the DEFT 2014 challenge, which dealt with genre classification of French literary texts. We developed three types of features: lemmatized words, stylometric features, and features that incorporate some form of world knowledge. Classification experiments were then performed using the Balanced Winnow classifier. We submitted three different runs, of which the best-scoring one combined all features.

Keywords: text categorization, DEFT, literary genre.
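The abstract names Balanced Winnow as the classifier but gives no further detail. As a rough illustration only, the sketch below shows a textbook-style binary Balanced Winnow learner over sparse binary features. It is not the author's implementation: the class name, the parameter values (`alpha`, `beta`, `theta`, initial weights), and the toy feature sets are all assumptions made for this example.

```python
from collections import defaultdict


class BalancedWinnow:
    """Minimal binary Balanced Winnow over sets of active (binary) features.

    All defaults are illustrative assumptions, not values from the paper.
    """

    def __init__(self, alpha=1.1, beta=0.9, theta=1.0, init_pos=2.0, init_neg=1.0):
        self.alpha = alpha  # promotion factor (> 1)
        self.beta = beta    # demotion factor (< 1)
        self.theta = theta  # decision threshold
        # Each feature has a positive and a negative weight; the effective
        # weight is their difference, so it can become negative.
        self.w_pos = defaultdict(lambda: init_pos)
        self.w_neg = defaultdict(lambda: init_neg)

    def score(self, features):
        return sum(self.w_pos[f] - self.w_neg[f] for f in features)

    def predict(self, features):
        return self.score(features) > self.theta

    def update(self, features, label):
        """Mistake-driven multiplicative update; returns True on a mistake."""
        if self.predict(features) == label:
            return False
        # Missed positive: promote w_pos, demote w_neg. False positive: reverse.
        up, down = (self.alpha, self.beta) if label else (self.beta, self.alpha)
        for f in features:
            self.w_pos[f] *= up
            self.w_neg[f] *= down
        return True

    def fit(self, examples, epochs=10):
        for _ in range(epochs):
            mistakes = sum(self.update(x, y) for x, y in examples)
            if mistakes == 0:  # stop once an epoch makes no mistakes
                break
        return self


# Hypothetical toy data: sets of active features, True = "poetry".
examples = [
    ({"love", "heart"}, True),
    ({"rhyme", "heart"}, True),
    ({"chapter", "plot"}, False),
    ({"plot", "dialogue"}, False),
]
clf = BalancedWinnow().fit(examples)
print([clf.predict(x) for x, _ in examples])
```

A multi-class genre task could be handled by training one such learner per genre and picking the highest score, though the abstract does not say how the submitted runs were configured.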

Similar resources

Comparative Analysis of Balanced Winnow and SVM in Large Scale Patent Categorization

This study investigates the effect of training different categorization algorithms on a corpus that is significantly larger than those reported in experiments in the literature. By means of machine learning techniques, a collection of 1.2 million patent applications is used to build a classifier that is able to classify documents with varyingly large feature spaces into the International Classi...

Automatic thematic classification of election manifestos

  • We aim to develop a classifier which assigns themes to unseen Dutch election manifestos written after Lipschits’ work
  • We have to rely on the older data from the eighties and nineties for training and optimization of the classifier
  • System was tuned by testing on 1998 data, while using older data as training material
  • Balanced Winnow, implementation in the Linguistic Classification System ...

Text Categorization for Intellectual Property Comparing Balanced Winnow with SVM on Different Document Representations

This study investigates the effect of training different categorization algorithms on various patent document representations. The automation of knowledge and content management in the intellectual property domain has been experiencing a growing interest in the last decade, since the first patent classification system was presented in 1999 by Larkey [Larkey, 1999]. Typical applications of paten...

Automatic Music Genre Identification

Nowadays, automatic analysis of music signals has gained a considerable importance due to the growing amount of music data found on the Web. Music genre classification is one of the interesting research areas in music information retrieval systems. In this paper several techniques were implemented and evaluated for music genre classification including feature extraction, feature selection and m...

Adaptive Learning Rate for Online Linear Discriminant Classifiers

We propose a strategy for updating the learning rate parameter of online linear classifiers for streaming data with concept drift. The change in the learning rate is guided by the change in a running estimate of the classification error. In addition, we propose an online version of the standard linear discriminant classifier (O-LDC) in which the inverse of the common covariance matrix is update...

Publication date: 2014